8 research outputs found

    Fast and Lean Immutable Multi-Maps on the JVM based on Heterogeneous Hash-Array Mapped Tries

    An immutable multi-map is a many-to-many thread-friendly map data structure with expected fast insert and lookup operations. This data structure is used in applications that process graphs or many-to-many relations, for example in static analysis of object-oriented systems. When processing such big data sets, the memory overhead of the data structure encoding itself becomes a memory usage bottleneck. Motivated by reuse and type-safety, libraries for Java, Scala and Clojure typically implement immutable multi-maps by nesting sets as the values with the keys of a trie map. With this approach, based on our measurements, the expected byte overhead for a sparse multi-map per stored entry adds up to around 65B, which renders it infeasible to compute with effectively on the JVM. In this paper we propose a general framework for Hash-Array Mapped Tries on the JVM which can store type-heterogeneous keys and values: a Heterogeneous Hash-Array Mapped Trie (HHAMT). Among other applications, this allows for a highly efficient multi-map encoding by (a) not reserving space for empty value sets, (b) inlining the values of singleton sets, while (c) maintaining a type-safe API. We detail the necessary encoding and optimizations to mitigate the overhead of storing and retrieving heterogeneous data in a hash-trie. Furthermore, we evaluate HHAMT specifically for the application to multi-maps, comparing it to state-of-the-art encodings of multi-maps in Java, Scala and Clojure. We isolate key differences using microbenchmarks and validate the resulting conclusions on a real-world case in static analysis. The new encoding brings the per key-value storage overhead down to 30B: a 2x improvement. With additional inlining of primitive values it reaches a 4x improvement.
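
    To illustrate points (a) and (b), here is a minimal, hypothetical Java sketch of singleton inlining for multi-maps. It is not the paper's trie-based HHAMT (which discriminates heterogeneous slots via per-node bitmaps rather than instanceof checks, and is immutable), all names are invented, and the sketch is mutable for brevity:

    ```java
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch: a key maps either directly to a single value or to a
    // Set<V>, so no set object is allocated for singleton or empty entries.
    // (Assumes V itself is never a Set; the real HHAMT avoids this caveat.)
    final class InliningMultiMap<K, V> {
        private final Map<K, Object> store = new HashMap<>(); // slot holds V or Set<V>

        @SuppressWarnings("unchecked")
        void put(K key, V value) {
            Object current = store.get(key);
            if (current == null) {
                store.put(key, value);               // singleton: inline the value itself
            } else if (current instanceof Set) {
                ((Set<V>) current).add(value);       // already a nested set: just add
            } else {
                Set<V> values = new HashSet<>();     // second value: promote to a set
                values.add((V) current);
                values.add(value);
                store.put(key, values);
            }
        }

        @SuppressWarnings("unchecked")
        Set<V> get(K key) {
            Object current = store.get(key);
            if (current == null) return Set.of();
            if (current instanceof Set) return (Set<V>) current;
            return Set.of((V) current);              // wrap the inlined singleton on lookup
        }
    }
    ```

    In such an encoding a multi-map dominated by singleton entries pays only for the inlined value, rather than for a one-element set object per key.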

    Towards Zero-Overhead Disambiguation of Deep Priority Conflicts

    **Context** Context-free grammars are widely used for language prototyping and implementation. They allow formalizing the syntax of domain-specific or general-purpose programming languages concisely and declaratively. However, the natural and concise way of writing a context-free grammar is often ambiguous. Therefore, grammar formalisms support extensions in the form of *declarative disambiguation rules* to specify operator precedence and associativity, solving ambiguities that are caused by the subset of the grammar that corresponds to expressions.

    **Inquiry** Implementing support for declarative disambiguation within a parser typically comes with one or more of the following limitations in practice: a lack of parsing performance, or a lack of modularity (i.e., disallowing the composition of grammar fragments of potentially different languages). The latter concern is generally addressed by scannerless generalized parsers. We aim to equip scannerless generalized parsers with novel disambiguation methods that are inherently performant, without compromising modularity and language composition.

    **Approach** In this paper, we present a novel low-overhead implementation technique for disambiguating deep associativity and priority conflicts in scannerless generalized parsers with lightweight data-dependency.

    **Knowledge** Ambiguities with respect to operator precedence and associativity arise from combining the various operators of a language. While *shallow conflicts* can be resolved efficiently by one-level tree patterns, *deep conflicts* require more elaborate techniques, because they can occur arbitrarily nested in a tree. Current state-of-the-art approaches to solving deep priority conflicts come with a severe performance overhead.

    **Grounding** We evaluated our new approach against state-of-the-art declarative disambiguation mechanisms. By parsing a corpus of popular open-source repositories written in Java and OCaml, we found that our approach yields speedups of up to 1.73x over a grammar rewriting technique when parsing programs with deep priority conflicts, with a modest overhead of 1-2% when parsing programs without deep conflicts.

    **Importance** A recent empirical study shows that deep priority conflicts are indeed widespread in real-world programs: in a corpus of popular OCaml projects on GitHub, up to 17% of the source files contain deep priority conflicts. However, no solution in the literature addresses efficient disambiguation of deep priority conflicts while supporting modular and composable syntax definitions.
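
    As an illustration of the shallow vs. deep distinction, consider a grammar in which a low-priority prefix construct (such as if-then without else, or match) can end up arbitrarily deep on the rightmost spine of an infix operator's left operand. The following Java sketch uses a hypothetical AST and filter code invented purely for illustration; the technique evaluated in the paper performs this kind of check during parsing via lightweight data dependencies rather than by filtering trees after the fact:

    ```java
    // Hypothetical AST for a tiny expression language with a low-priority prefix
    // construct ("if c then e", no else) and a higher-priority infix "+".
    sealed interface Exp permits Num, Add, IfThen {}
    record Num(int value) implements Exp {}
    record Add(Exp left, Exp right) implements Exp {}
    record IfThen(Exp cond, Exp body) implements Exp {}

    final class ConflictFilters {
        // Shallow conflict: "if" as the *direct* left child of "+", as in
        // "if c then 1 + 2" parsed as Add(IfThen(c, 1), 2). A one-level tree
        // pattern Add(IfThen(_, _), _) suffices to reject this tree.
        static boolean shallowConflict(Add add) {
            return add.left() instanceof IfThen;
        }

        // Deep conflict: the same "if" may sit anywhere on the rightmost spine of
        // the left operand, e.g. "1 + if c then 2 + 3" parsed as
        // Add(Add(1, IfThen(c, 2)), 3). No fixed one-level pattern catches this,
        // so the filter must walk the spine to arbitrary depth.
        static boolean deepConflict(Add add) {
            Exp spine = add.left();
            while (true) {
                if (spine instanceof IfThen) return true;              // "if" would capture the "+"
                if (spine instanceof Add inner) spine = inner.right(); // continue down rightmost spine
                else return false;
            }
        }
    }
    ```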

    Towards a Feature Model of Trie-Based Collections

    This archive contains a snapshot of a continuously evolving feature model of the domain of trie-based collection data structures. The feature model is expressed in FDL, a Feature Description Language [1]; we use an extension of FDL that adds an integer data type (int) to the model. For convenient viewing, the Rascal programming language (http://www.rascal-mpl.org), together with its Eclipse (https://www.eclipse.org) environment, supports syntax highlighting and experimental visualization of FDL diagrams.

    [1] van Deursen, A., & Klint, P. (2002). Domain-Specific Language Design Requires Feature Descriptions. Journal of Computing and Information Technology, 10(1), 1–17. http://doi.org/10.2498/cit.2002.01.01

    Code Specialization for Memory Efficient Hash Tries (Short Paper)

    The hash trie data structure is a common part of standard collection libraries of JVM programming languages such as Clojure and Scala. It enables fast immutable implementations of maps, sets, and vectors, but it requires considerably more memory than an equivalent array-based data structure. This hinders the scalability of functional programs and the further adoption of this otherwise attractive style of programming. In this paper we present a product family of hash tries. We generate Java source code to specialize them using knowledge of JVM object memory layout. The number of possible specializations is exponential. The optimization challenge is thus to find a minimal set of variants which leads to a maximal reduction in memory footprint on any given data. Using a set of experiments we measured the distribution of internal tree node sizes in hash tries. We used the results as guidance to decide which variants of the family to generate and which variants should be left to the generic implementation. A preliminary validating experiment on the implementation of sets and maps shows that this technique leads to a median decrease of 55% in memory footprint for maps (and 78% for sets), while still maintaining comparable performance. Our combination of data analysis and code specialization proved to be effective.
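
    A minimal Java sketch of the underlying idea (invented names, not the generated code from the paper): alongside a generic array-backed node, specialized variants for the most frequent small arities keep their content in plain fields, saving the array object's header, length field, and extra indirection per node. The measured node-size distribution then decides which of the exponentially many variants are worth generating:

    ```java
    // Hypothetical sketch of trie-node specialization. The generic node backs its
    // content with an Object[] array; specialized variants for small arities hold
    // their slots in plain fields instead.
    abstract class TrieNode {
        abstract int arity();
        abstract Object slot(int index);
    }

    final class GenericNode extends TrieNode {     // fallback for all remaining arities
        private final Object[] slots;
        GenericNode(Object[] slots) { this.slots = slots; }
        int arity() { return slots.length; }
        Object slot(int index) { return slots[index]; }
    }

    final class Node1 extends TrieNode {           // specialized variant for arity 1
        private final Object slot0;
        Node1(Object slot0) { this.slot0 = slot0; }
        int arity() { return 1; }
        Object slot(int index) { return slot0; }
    }

    final class Node2 extends TrieNode {           // specialized variant for arity 2
        private final Object slot0, slot1;
        Node2(Object slot0, Object slot1) { this.slot0 = slot0; this.slot1 = slot1; }
        int arity() { return 2; }
        Object slot(int index) { return index == 0 ? slot0 : slot1; }
    }

    final class TrieNodes {
        // A factory picks a specialized class when one exists for the requested arity.
        static TrieNode of(Object... slots) {
            switch (slots.length) {
                case 1:  return new Node1(slots[0]);
                case 2:  return new Node2(slots[0], slots[1]);
                default: return new GenericNode(slots);
            }
        }
    }
    ```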
